Fusing language information from diverse data sources for phonotactic language recognition
Abstract
The baseline approach to building phonotactic language recognition systems is to characterize each language by a single phonotactic model generated from all the available language-specific training data. When several data sources are available for a given target language, system performance can be improved by using language source-dependent phonotactic models. In this case, the common practice is to fuse language source information (i.e., the phonotactic scores for each language/source) early, at the input to the backend. This paper proposes to postpone the fusion to the end, at the output of the backend. In this case, the language recognition score can be estimated from well-calibrated language source scores. Experiments were conducted using the NIST LRE 2007 and NIST LRE 2009 evaluation data sets under the 30 s condition. On the NIST LRE 2007 eval data, a Cavg of 0.9% is obtained for the closed-set task and 2.5% for the open-set task. Compared to the common practice of early fusion, these results represent relative improvements of 18% and 11% for the closed-set and open-set tasks, respectively. Initial tests on the NIST LRE 2009 eval data gave no improvement on the closed-set task. Moreover, the Cllr measure indicates that the language recognition scores estimated by the proposed approach are better calibrated than those obtained with the common practice (early fusion).
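The late-fusion idea can be illustrated with a minimal sketch. It assumes the backend has already produced calibrated log-likelihoods, one per language/source pair, and that the language score is obtained by averaging the likelihoods of a language's sources under a uniform source prior. The abstract does not state the combination rule actually used, so the averaging rule, the function name `late_fusion`, and the toy numbers are all illustrative assumptions.

```python
import numpy as np


def late_fusion(class_log_liks, class_to_lang, n_langs):
    """Fuse calibrated (language, source) log-likelihoods into language scores.

    class_log_liks: array of shape (n_utts, n_classes), one column per
        language/source pair, taken at the OUTPUT of the backend.
    class_to_lang: length-n_classes sequence giving the language index of
        each column. Every language is assumed to have at least one source.
    Assumption (not from the paper): sources are combined by averaging
    their likelihoods, i.e. a uniform prior over a language's sources.
    """
    class_log_liks = np.asarray(class_log_liks, dtype=float)
    class_to_lang = np.asarray(class_to_lang)
    fused = np.empty((class_log_liks.shape[0], n_langs))
    for lang in range(n_langs):
        cols = class_log_liks[:, class_to_lang == lang]
        # Numerically stable log of the mean likelihood over sources.
        peak = cols.max(axis=1, keepdims=True)
        log_mean = peak + np.log(np.exp(cols - peak).mean(axis=1, keepdims=True))
        fused[:, lang] = log_mean[:, 0]
    return fused


# Toy example: language 0 has two sources, language 1 has one.
scores = np.log(np.array([[0.2, 0.4, 0.1]]))
fused = late_fusion(scores, class_to_lang=[0, 0, 1], n_langs=2)
```

In early fusion, by contrast, the full vector of language/source scores would be passed to the backend as input features, and the backend would output one score per language directly.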
Similar resources
Towards High Performance Phonotactic Feature for Spoken Language Recognition
With the demands of globalization, multilingual speech is increasingly common in conversational telephone speech, broadcast news and internet podcasts. Therefore, automatic spoken language recognition has become an important technology in multilingual speech related applications. For example, automatic spoken language recognition has been used as a preprocessing component for spoken language tr...
Improving phonotactic language recognition with acoustic adaptation
In recent evaluations of automatic language recognition systems, phonotactic approaches have proven highly effective [1][2]. However, as most of these systems rely on underlying ASR techniques to derive a phonetic tokenization, these techniques are potentially susceptible to acoustic variability from non-language sources (i.e. gender, speaker, channel, etc.). In this paper we apply techniques f...
A Language Independent Approach To Acquiring Phonotactic Resources for Speech Recognition
Building and developing linguistic resources for languages is of prime importance, with many areas of application. This paper focuses on a fully automatic approach to the acquisition of a syllable phonotactics for a particular language. In this approach the phonotactic constraints for a language are encoded in a finite-state phonotactic automaton, the structure of which can be automatically deriv...
Effective Arabic Dialect Classification Using Diverse Phonotactic Models
We study the effectiveness of recently developed language recognition techniques based on speech recognition models for the discrimination of Arabic dialects. Specifically, we investigate dialect-specific and cross-dialectal phonotactic models, using both language models and support vector machines (SVMs). Techniques are evaluated both alone and in combination with a cepstral system with joint ...
Using cross-decoder co-occurrences of phone n-grams in SVM-based phonotactic language recognition
Most common approaches to phonotactic language recognition deal with several independent phone decoders. Decodings are processed and scored in a fully uncoupled way, their time alignment (and the information that may be extracted from it) being completely lost. Recently, we have presented a new approach to phonotactic language recognition which takes into account time alignment information, by ...